ML & Cybersecurity Project
Anomaly Detection in IoT Network Traffic
Developed an anomaly detection pipeline on the CTU-IoT Malware dataset to distinguish malicious from benign network connections. The project applies unsupervised learning to cybersecurity data, covering preprocessing, feature engineering, and model evaluation.
Key Contributions:
- Exploratory Data Analysis: Inspected 23 features across 23k+ network traffic entries. Explored data types, distributions, and correlations.
- Data Preprocessing:
  - Handled missing values and encoded categorical variables.
  - Treated IP addresses and ports as categorical features.
  - Engineered new features (e.g., rolling connection counts over time windows).
- Pipeline Construction: Built preprocessing pipelines with ColumnTransformer and Pipeline to standardize numeric features and encode categorical ones.
- Anomaly Detection Models: Experimented with clustering and other unsupervised methods to flag unusual traffic patterns that may indicate malware activity.
- Cybersecurity Application: Interpreted anomalies in the context of malicious traffic detection.
Skills Demonstrated:
- Machine Learning (unsupervised learning, clustering, anomaly detection)
- Feature engineering for network traffic data
- Python ML stack: pandas, scikit-learn, numpy, matplotlib, seaborn
- Cybersecurity analytics (malware/attack traffic detection)
- Model pipeline design & evaluation
Links:
- Rendered Notebook PDF: ML unsupervised learning – anomaly detection pipeline
- CTU-IoT Malware Kaggle Dataset: Malware Detection in Network Traffic Data
- GitHub